14 research outputs found
Incremental file reorganization schemes
Issued as Final project report, Project no. G-36-66
Design and implementation of a time warp parallel database system
Issued as Report, Project C-36-68
Efficiency and Security Trade-Off in Supporting Range Queries on Encrypted Databases
The database-as-a-service (DAS) model is an emerging computing paradigm in
which the DBMS functions are outsourced. Since the server may not be fully
trusted, it is desirable to store data on database servers in encrypted form
to reduce security and privacy risks. This usually implies sacrificing
functionality and efficiency for security. Several approaches have been
proposed in the recent literature for efficiently supporting queries on
encrypted databases. These approaches differ in how the index of attribute
values is created; random one-to-one mapping and order-preserving encryption
are two examples. In this paper we adapt a prefix-preserving encryption scheme
to create the index. All of these approaches seek a convenient trade-off
between efficiency and security. We discuss the security issues and efficiency
of these approaches for supporting range queries on encrypted numeric data.
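The abstract does not give the authors' construction; purely as an illustration of the prefix-preserving property such an index relies on, here is a minimal toy scheme (the function names and the SHA-256-based pseudorandom bit are assumptions, not the paper's scheme). Each ciphertext bit is the plaintext bit XORed with a pseudorandom bit derived from the key and the preceding plaintext prefix, so equal plaintext prefixes map to equal ciphertext prefixes:

```python
import hashlib

def _prf_bit(key: bytes, prefix_bits: str) -> int:
    # Pseudorandom 0/1 derived from the key and the bit-prefix seen so far.
    h = hashlib.sha256(key + prefix_bits.encode()).digest()
    return h[0] & 1

def pp_encrypt(key: bytes, value: int, width: int = 8) -> int:
    """Toy prefix-preserving encryption: ciphertext bit i depends only on
    plaintext bits 0..i, so values sharing a k-bit plaintext prefix yield
    ciphertexts sharing a k-bit prefix."""
    bits = format(value, f"0{width}b")
    out = []
    for i, b in enumerate(bits):
        out.append(str(int(b) ^ _prf_bit(key, bits[:i])))
    return int("".join(out), 2)
```

Because prefixes survive encryption, a numeric range can be covered by a small set of bit prefixes and translated into prefix lookups on the encrypted index; the server learns which prefixes match, which is exactly the efficiency/security trade-off the abstract refers to.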
Image Mining: A New Approach for Data Mining
We introduce a new focus for data mining, concerned with knowledge discovery
in image databases. We expect all aspects of data mining to be relevant to
image mining, but in this first work we concentrate on the problem of finding
associations. To that end, we present a data mining algorithm to find
association rules in 2-dimensional color images. The algorithm has four major
steps: feature extraction, object identification, auxiliary image creation,
and object mining. Our algorithm is general in that it does not rely on any
type of domain knowledge. A synthetic image set containing geometric shapes
was generated to test our initial algorithm implementation. Our experimental
results show that image mining is feasible. We also suggest several directions
for future work in this area.
The Sensible Sharing Approach to a Scalable, High-Performance Database System
Exploiting parallelism has become the key to building high-performance database
systems. Several approaches to building database systems that support both
inter- and intra-query parallelism have been proposed. These approaches can be
broadly classified as either Shared Nothing (SN) or Shared Everything (SE).
Although the SN approach is highly scalable, it requires complex data
partitioning and tuning to achieve good performance, whereas the SE approach
does not scale well. We propose a sensible sharing approach that combines the
advantages of both SN and SE. We propose an architecture, and data partitioning
and scheduling strategies, that promote sensible sharing. We analyze the
performance and scalability of our approach and compare them with those of an
SN system. We find that for a variety of workloads and degrees of data skew,
our approach performs and scales at least as well as an SN system that uses the
best possible data partitioning strategy.
Avoiding Conflicts between Reads and Writes Using Dynamic Versioning
In this paper, we discuss a new approach to multi-version concurrency control,
called Dynamic Versioning, that avoids the data contention due to conflicts
between Reads and Writes. A data item is allowed to have several committed
versions and at most one uncommitted version. A conflict between a Read and a
Write is resolved by imposing an order between the requesting transactions
and allowing the Read to access one of the committed versions. The space
overhead is reduced to the minimum possible by making the versions dynamic: a
version exists only as long as it may be accessed by an active transaction.
Conditional lock compatibilities are used to provide serializable access to
the multiple versions. The results from simulation studies indicate that the
dynamic versioning method, with little space overhead (about 1% of the size of
the database), significantly reduces blocking (by 60% to 90%) compared to
single-version two-phase locking. Lower blocking rates increase transaction
throughput and reduce the variance in transaction response times through better
utilization of resources. This approach also reduces starvation of short
transactions and subsumes previous methods proposed for supporting long-running
queries. The dynamic versioning method can be easily incorporated into existing
DBMSs. The modifications required in the lock manager and storage manager
modules to implement dynamic versioning are discussed.
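The core idea, several committed versions plus at most one uncommitted version per item, with reads served from committed versions and versions garbage-collected once no active transaction can reach them, can be sketched as follows. This is a minimal illustration of the versioning rules only; the class and method names are assumptions, and the paper's conditional lock compatibilities and ordering protocol are not modeled:

```python
class VersionedItem:
    """Toy model of dynamic versioning: a data item with several committed
    versions and at most one uncommitted version."""

    def __init__(self, value, ts=0):
        self.committed = [(ts, value)]   # (commit timestamp, value), ascending
        self.uncommitted = None          # (writer_id, value) or None

    def write(self, writer_id, value):
        # At most one uncommitted version may exist at a time.
        if self.uncommitted is not None and self.uncommitted[0] != writer_id:
            raise RuntimeError("write-write conflict: writer must wait")
        self.uncommitted = (writer_id, value)

    def read(self, start_ts):
        # A Read never blocks on the uncommitted version: it is served from
        # the latest version committed before the reader started.
        for ts, value in reversed(self.committed):
            if ts <= start_ts:
                return value
        raise KeyError("no visible version")

    def commit(self, writer_id, commit_ts):
        assert self.uncommitted and self.uncommitted[0] == writer_id
        self.committed.append((commit_ts, self.uncommitted[1]))
        self.uncommitted = None

    def abort(self, writer_id):
        if self.uncommitted and self.uncommitted[0] == writer_id:
            self.uncommitted = None

    def gc(self, oldest_active_ts):
        # Versions are dynamic: keep only the versions some active
        # transaction may still read (newest version at or before the
        # oldest active timestamp, plus everything newer).
        keep = [v for v in self.committed if v[0] > oldest_active_ts]
        newest_old = max((v for v in self.committed
                          if v[0] <= oldest_active_ts), default=None)
        self.committed = ([newest_old] if newest_old else []) + keep
```

The `gc` step is what keeps the space overhead near the minimum the abstract cites: old versions disappear as soon as the oldest active transaction advances past them.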
Shadow Logging - An Efficient Transaction Recovery Method
In this paper, we present LU-Logging, an efficient transaction
recovery method. The method is based on a flexible-redo/minimal-undo
algorithm. The paper describes an implementation that avoids the
overheads of deferred updating used in previous no-undo implementations.
An update by a transaction to a data record does not immediately update the
data record. Instead, it generates a redo log record and associates it with
the data page. Each page in the database has an associated log page, which
contains the still-uncommitted log records of the updates to the data page.
The log page is read from and written to disk along with the corresponding
data page. This gives the flexibility of applying the redo log records at any
time after the transaction commits, in particular when the data page is read
by another transaction. We call this updating lazy. For aborted transactions,
the redo log records are simply discarded. Simulation studies show that the
overhead during normal transaction processing for LU-Logging is comparable to
that of traditional logging. The crash recovery time is shown to be an order
of magnitude faster than that of traditional logging.
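The lazy-update mechanism described above can be sketched in a few lines. This is an assumed simplification for illustration (the names `Page`, `update`, and `read` are hypothetical, and the on-disk pairing of log pages with data pages is not modeled): an update only appends a redo record to the page's log page, and the records are applied, or discarded, the next time the page is read:

```python
class Page:
    """A data page paired with its log page of still-uncommitted redo records."""

    def __init__(self, records):
        self.records = dict(records)  # record id -> value (the data page)
        self.log = []                 # (txn_id, record_id, new_value)

def update(page, txn_id, record_id, new_value):
    # Do not touch the data record itself; just attach a redo log record
    # to the page's log page.
    page.log.append((txn_id, record_id, new_value))

def read(page, record_id, committed, aborted):
    # Lazily apply redo records of committed transactions (any time after
    # commit is fine) and discard records of aborted transactions.
    remaining = []
    for txn_id, rid, val in page.log:
        if txn_id in committed:
            page.records[rid] = val               # redo applied lazily
        elif txn_id not in aborted:
            remaining.append((txn_id, rid, val))  # still in flight: keep
        # aborted transactions' records are simply dropped
    page.log = remaining
    return page.records[record_id]
```

Because redo records ride along with their data page, a crash loses no committed work and recovery needs no undo pass, which is consistent with the order-of-magnitude recovery speedup the abstract reports.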
Evolution in Data Streams
Conventional data mining deals with static data stored on disk, for example, using the current state of a data warehouse. In addition, the data may be read multiple times to accomplish the mining task. Recently, the data stream paradigm has become the focus of study, where data is continuously arriving as a sequence of elements and the data mining task has to be done in a single pass. An example is to construct a model (or models) of the data, as in clustering or classification, in a single pass and with limited memory. Under the data stream model, data arrives as one or more potentially infinite streams. Data streams can flow at variable rates, and the underlying models often change with time. The current work in data stream mining does not focus on change ("evolution"), and that is precisely our main focus. Monitoring the changes in the models becomes as important as obtaining the models. Therefore, stream data mining not only needs to mine data incrementally and decrementally (in order to keep track of recent data), but also has to provide methods to monitor/detect the changes of the underlying models. We refer to this problem as "data evolution." Of equal importance, the mining algorithms themselves need to be adaptive/dynamic when the flow rate of data streams changes dramatically. That is, the algorithms should be able to downgrade accuracy in order to handle a data burst, or to do a more thorough analysis when data flow is slow. We refer to this problem as "algorithm evolution." We will study both data evolution and algorithm evolution. We will provide efficient algorithms to incrementally/decrementally mine stream data, good techniques to store data models and detect/monitor changes, and a set of algorithms that can switch from "high resolution" to "low resolution" in order to adapt to the flow rate.
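The incremental/decremental mining the proposal calls for can be illustrated with a sliding-window frequency miner; this is only a minimal sketch of the idea (the class and its parameters are assumptions, not the proposed system): each arriving element updates the model incrementally, and expired elements are removed decrementally so the model tracks recent data:

```python
from collections import Counter, deque

class SlidingWindowMiner:
    """Minimal sketch: maintain frequent items over the last `window`
    stream elements via incremental and decremental count updates."""

    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.counts = Counter()

    def add(self, item):
        self.buf.append(item)
        self.counts[item] += 1           # incremental update
        if len(self.buf) > self.window:
            old = self.buf.popleft()     # decremental update: expire old data
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, min_support):
        # The current model: items frequent within the recent window.
        n = len(self.buf)
        return {i for i, c in self.counts.items() if c / n >= min_support}
```

Shrinking the window (or sampling arrivals) during a data burst is one simple way such an algorithm could trade accuracy for throughput, in the spirit of the "algorithm evolution" described above.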
An Efficient Algorithm for Mining Association Rules in Large Databases
Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an efficient algorithm for mining association rules that is fundamentally different from known algorithms.
Compared to previous algorithms, ours reduces both CPU and I/O overheads.
In our experimental study we found that, for large databases, the CPU
overhead was reduced by as much as a factor of seven and I/O by almost an
order of magnitude. Hence this algorithm is especially suitable for very
large databases. The algorithm is also ideally suited for parallelization.
We have performed extensive experiments and compared the performance of the
algorithm with that of one of the best existing algorithms.
Adaptive and Automated Index Selection in RDBMS
We present a novel approach for a tool that assists the database administrator
in designing an index configuration for a relational database system. A new
methodology for collecting usage statistics at run time is developed that
lets the optimizer estimate query execution costs for alternative index
configurations. Defining the workload specification required by existing
index design tools can be very complex for a large integrated database system.
Our tool automatically derives the workload statistics. These statistics are
then used to efficiently compute an index configuration. Execution of a
prototype of the tool against a sample database demonstrates that the proposed
index configuration is reasonably close to the optimum for the test query sets.